Summary

In total, 108 compounds have been used for enrichment culture, and 85 of them produced colonies when
used as the sole source of carbon and energy. Isolates from these enrichments have been sequenced and
their genomes annotated. Analysis identified 38 genomes with candidate transcription factors that
likely respond to one of the 108 chemicals. Constructs for cloning and evaluating transcription
factors (with an improved dynamic range) were completed previously. We are now in a position to
empirically identify transcription factors that can report on compound biosynthesis.

Introduction 


The overall goal of this contract is to link cell-based production to cell survival and thereby make
the engineering of new microbial strains that produce industrially relevant biochemicals routine.
Recent synthetic biology techniques can generate billions of variant cells. Although many
potentially informative mutants are easily made, product yield can be measured in only a few of
them, because the majority of industrially relevant biomolecules are not chromophores or otherwise
easily detected. Genetic circuits, however, can link chemical production to discernible signals such
as growth or color intensity, which would allow numerous mutants and mutant combinations to be
examined quickly. The genetic circuits needed to screen mutant populations in parallel rely on an
appropriate biosensor that activates a reporter gene in a product-dependent fashion, and such
biosensors are not routinely available. In this project, genes for one-component and two-component
signaling systems that respond to industrially relevant biomolecules are being identified. To
demonstrate that such sensors can be used to maximize product yield, one sensing system will be
further engineered: we will reformat it so that it drives expression of a reporter such as an
antibiotic resistance marker. This sensor/resistance cassette, together with a biosynthetic pathway
capable of producing the molecule to which the sensor responds, will be placed in a heterologous
host that lacks an overlapping pathway. Basal synthesis of the target chemical by the orthogonal
biosynthetic pathway activates the sensor and increases transcription of the resistance marker
(i.e., the reporter). In other words, the fermentation product is also the sensor ligand, so
biosynthesis drives production of the reporter and a discernible cell phenotype. Targeted,
genome-wide, barcoded alterations will then be introduced into the host genome, and variants with
progressively higher chemical production will be selected by virtue of increased reporter activity.
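The production-to-survival coupling described above can be pictured with a toy model. This is a minimal sketch under invented assumptions, not the project's design: it supposes that the sensor activates the resistance marker through a Hill function of product titer, and that a cell survives antibiotic challenge when reporter expression clears a fixed threshold. The function names, Hill parameters, thresholds, and variant titers are all hypothetical.

```python
"""Toy model of production-coupled selection (illustrative assumptions only)."""


def resistance_expression(product: float, k_half: float = 1.0, n: float = 2.0,
                          basal: float = 0.05, max_expr: float = 1.0) -> float:
    """Hypothetical Hill-type activation of the resistance reporter by the product."""
    activation = product ** n / (k_half ** n + product ** n)
    # Basal (leaky) expression plus the sensor-dependent fraction.
    return basal + (max_expr - basal) * activation


def select_survivors(titers: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Names of variants whose reporter expression clears the survival threshold."""
    return sorted(name for name, titer in titers.items()
                  if resistance_expression(titer) >= threshold)


if __name__ == "__main__":
    # Hypothetical variant library: name -> product titer (arbitrary units).
    library = {"parent": 0.5, "mutA": 0.8, "mutB": 1.5, "mutC": 3.0}
    print(select_survivors(library))                 # ['mutB', 'mutC']
    # A stricter threshold (e.g. more antibiotic) keeps only the best producer.
    print(select_survivors(library, threshold=0.8))  # ['mutC']
```

Raising the threshold between rounds, analogous to increasing antibiotic pressure, is one way to picture how variants with progressively higher production would be enriched.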

Methods, Assumptions and Procedures

To identify candidate transcription factors for experimental evaluation, we previously processed 13
sequenced genomes and have now completed this task for the full dataset. We first built a BLAST
database of all biodegradation gene clusters that we could identify in sequence repositories. Best
‘hits’ for the experimentally isolated colonies were then identified by querying this database with
each genome's annotated open reading frames (ORFs). A stringent E-value cutoff (1e-80) was used so
that only ORFs very similar to experimentally characterized biodegradation enzymes were labeled.
Potential degradation pathways were collected by parsing the output and selecting regions where two
or more putative degradation enzymes occurred within a sliding window of 10 ORFs and were
co-located with a one-component, two-component, or TonB sensor system. This approach mimics the
gene-mapping algorithm developed for marking biosynthetic gene clusters; in this case, however,
catabolic genes are used to map potential degradation pathways. One assumption that limits the
effectiveness of this approach is that the genes encoding a catabolic cluster are co-located, both
on the native chromosome and on a single one of the tens to hundreds of assembled sequence
fragments (contigs) that represent a ‘next-generation’ genome. The assumption is usually
reasonable: catabolic enzymes for a particular pathway are typically co-located, and incomplete
genome assembly (i.e., fragmentation) typically occurs at long repetitive sequences such as
ribosomal operons and insertion elements, which are rare within the boundaries of catabolic
clusters. Nevertheless, half of the genomes did not ultimately yield a candidate catabolic cluster
likely able to utilize the corresponding chemical. We will explore other funding opportunities to
understand the nature of this shortcoming more fully and to eliminate it.
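The cluster-finding step can be sketched in Python as follows. This is a minimal sketch, not the pipeline actually used: it assumes BLAST+ tabular output (`-outfmt 6`, whose default 12 columns place the E-value in the 11th column), ORFs listed in genomic order for each contig, and sensor ORFs (one-component, two-component, or TonB) already annotated by the caller. The function names and the sample ORF and subject IDs are hypothetical.

```python
"""Sketch: flag candidate catabolic clusters in an annotated draft genome."""
from collections.abc import Iterable

EVALUE_CUTOFF = 1e-80      # stringent cutoff described in the text
WINDOW = 10                # sliding-window size, in ORFs
MIN_DEGRADATION_HITS = 2   # putative degradation enzymes required per window


def degradation_hits(blast_lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> set[str]:
    """Query ORF IDs with a hit at or below the E-value cutoff.

    Assumes BLAST+ -outfmt 6 rows: qseqid ... evalue (11th column) bitscore.
    """
    hits: set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            hits.add(fields[0])
    return hits


def candidate_clusters(contig_orfs: list[str], hits: set[str], sensors: set[str],
                       window: int = WINDOW,
                       min_hits: int = MIN_DEGRADATION_HITS) -> list[tuple[int, int]]:
    """Merged [start, end) ORF-index ranges on one contig where a window holds
    >= min_hits degradation ORFs and at least one sensor ORF."""
    regions: list[tuple[int, int]] = []
    # A contig shorter than the window is treated as a single window.
    for start in range(max(1, len(contig_orfs) - window + 1)):
        block = contig_orfs[start:start + window]
        n_hits = sum(orf in hits for orf in block)
        if n_hits >= min_hits and any(orf in sensors for orf in block):
            end = start + len(block)
            if regions and start <= regions[-1][1]:
                # Overlapping qualifying windows are merged into one region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    blast = [
        "orf1\tpcaG_ref\t90.1\t200\t5\t0\t1\t200\t1\t200\t1e-120\t400",
        "orf3\tpcaH_ref\t85.0\t230\t10\t1\t1\t230\t1\t230\t2e-95\t350",
        "orf20\tcatA_ref\t40.0\t100\t50\t2\t1\t100\t1\t100\t1e-10\t60",
    ]
    hits = degradation_hits(blast)            # orf20 fails the cutoff
    orfs = [f"orf{i}" for i in range(1, 26)]  # one contig, ORFs in genomic order
    print(candidate_clusters(orfs, hits, sensors={"orf5"}))  # [(0, 10)]
```

Windows are evaluated within a single contig, so a cluster split across two contigs is missed. This is the co-location assumption discussed above, made explicit in code.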

Although manual inspection of candidate gene clusters was adequate for the 13 genomes from the
preliminary phase, this brute-force approach proved too cumbersome for the full dataset. We
addressed this issue by focusing on pathways appropriate for each specified chemical. All carbon
sources must ultimately supply material to central metabolism, so the structure of an initial carbon
source constrains the probable intermediates as it is transformed and funneled into central
metabolism. We therefore used elucidated degradation pathways and predictions from the University of
Minnesota Biocatalysis/Biodegradation Database to determine whether the degradation of each target
chemical likely proceeds through known steps (i.e., enzymes) downstream of initial processing. Each
genome was then probed with enzymes from the protocatechuate, benzoate, catechol, or other reference
pathway, as dictated by the structure of the parent compound. This yielded 1-3 high-quality
candidate clusters for nearly 40 chemicals. In the next phase, these candidates will be assembled
into the reporter system and experimentally validated.
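The structure-guided probing step might be organized as below. In this sketch the compound-to-pathway assignments (for example, vanillate and 4-hydroxybenzoate funneling through protocatechuate, phenol through catechol) and the probe enzyme names are illustrative examples only, not the curated assignments derived from the University of Minnesota database; the dictionaries and function names are hypothetical.

```python
"""Sketch: choose reference-pathway probes from the parent compound's structure."""

# Illustrative assignments of target compounds to the intermediate their
# degradation is expected to funnel through (hypothetical, not curated data).
COMPOUND_TO_PATHWAY = {
    "4-hydroxybenzoate": "protocatechuate",
    "vanillate": "protocatechuate",
    "phenol": "catechol",
    "benzoate": "benzoate",
}

# Illustrative probe enzymes for each reference lower pathway.
REFERENCE_PROBES = {
    "protocatechuate": ["PcaG", "PcaH"],   # protocatechuate 3,4-dioxygenase subunits
    "catechol": ["CatA", "CatB"],          # catechol 1,2-dioxygenase, muconate cycloisomerase
    "benzoate": ["BenA", "BenB", "CatA"],  # benzoate 1,2-dioxygenase subunits, then catechol cleavage
}


def probes_for(compound: str) -> list[str]:
    """Reference enzymes to probe with, as dictated by the compound's assigned pathway."""
    if compound not in COMPOUND_TO_PATHWAY:
        raise KeyError(f"no reference pathway assigned for {compound!r}")
    return REFERENCE_PROBES[COMPOUND_TO_PATHWAY[compound]]


def probe_genome(compound: str, genome_hits: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each selected probe to the genome ORFs that matched it (empty if none)."""
    return {probe: sorted(genome_hits.get(probe, [])) for probe in probes_for(compound)}


def pathway_covered(compound: str, genome_hits: dict[str, list[str]]) -> bool:
    """True when every selected probe has at least one matching ORF."""
    return all(probe_genome(compound, genome_hits).values())


if __name__ == "__main__":
    # Hypothetical BLAST results for one genome: probe -> matching ORF IDs.
    hits = {"PcaG": ["orf12"], "PcaH": ["orf13"], "CatA": ["orf40"]}
    print(pathway_covered("vanillate", hits))  # True
    print(pathway_covered("phenol", hits))     # False (no CatB match)
    print(probe_genome("benzoate", hits))
```

Restricting each genome's search to the probes implied by the parent compound keeps the candidate list short enough to inspect, which is the motivation given in the text for abandoning the brute-force approach.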


Results and Discussion


Screening of chemicals, processing of the resulting microbes, and construction of the necessary
plasmids were completed previously. We also improved the dynamic range of the reporter system and
reduced the labor needed to characterize transcription-factor candidates by using an automated
system for recording results. Candidate transcription factors and their corresponding operons are
now ready to be reformatted, assembled, and tested for induction by an exogenously supplied
effector. The E. coli host of this platform is now being screened with each of the relevant
chemicals to identify potential issues such as toxicity or background growth. A few more targets
will likely fail at this stage; where necessary, such issues will be addressed. Similarly, some
targets will be set aside, at least temporarily, because they require transporters or other genes
to be cloned in addition to the relevant transcription factor. After the E. coli screens are
completed, we expect about 20 candidates to be available for induction testing.
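One way to triage the automated induction readings is sketched below. It assumes, hypothetically, that each candidate is grown with and without the effector, that the reporter signal is normalized to OD600, and that a candidate is flagged as toxic when the effector reduces growth below 70% of the uninduced culture. The 3-fold induction threshold, the `Well` type, and the function names are placeholders, not project criteria.

```python
"""Sketch: triage one candidate from paired induced/uninduced readings."""
from dataclasses import dataclass


@dataclass
class Well:
    od600: float     # culture density at the time of reading
    reporter: float  # raw reporter signal (e.g. fluorescence), arbitrary units


def normalized_signal(well: Well) -> float:
    """Reporter signal per unit cell density."""
    return well.reporter / well.od600


def triage(uninduced: Well, induced: Well,
           min_fold: float = 3.0, min_growth_ratio: float = 0.7) -> str:
    """Classify a candidate as 'toxic', 'induced', or 'not induced'.

    Both thresholds are hypothetical placeholders.
    """
    growth_ratio = induced.od600 / uninduced.od600
    if growth_ratio < min_growth_ratio:
        # Effector impairs growth; fold induction would be unreliable.
        return "toxic"
    fold = normalized_signal(induced) / normalized_signal(uninduced)
    return "induced" if fold >= min_fold else "not induced"


if __name__ == "__main__":
    control = Well(od600=0.5, reporter=250.0)
    print(triage(control, Well(od600=0.5, reporter=2000.0)))  # induced (8-fold)
    print(triage(control, Well(od600=0.5, reporter=500.0)))   # not induced (2-fold)
    print(triage(control, Well(od600=0.2, reporter=1000.0)))  # toxic
```

Toxicity is checked before fold induction because normalizing the reporter signal by a depressed OD600 can distort the apparent induction.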


Rapid parallel screening for strain optimization

(HR0011-12-C-0062)


Conclusions


The results indicate that a chemical made by one organism is likely to be used as a growth
substrate by some other microbe. Bacteria typically utilize the most efficient carbon source
available first (glucose is often the preferred substrate). More exotic carbon sources are generally
subject to catabolite repression, and the systems for their utilization are activated only after
preferred carbon sources are exhausted. In addition to catabolite repression, sensors are often
employed so that the appropriate degradation pathway for a non-preferred carbon source is activated
only when that substrate is present. Our sequencing results have identified organisms rich in
transcription-factor-based sensors that are integrated with appropriate catabolic gene clusters.
With the technology employed, approximately 20% of a diverse set of target chemicals yielded readily
accessible biosensor candidates. With improvements in sequencing technology and declining costs, we
suspect that this yield could improve in the near term. For example, the largest step-down in
candidates worthy of promotion occurred during the identification of appropriate catabolic gene
clusters, the step most sensitive to genome fragmentation. Several recent publications and JCVI
testing of the PacBio next-generation sequencing platform indicate that mostly unfragmented
microbial genomes can now be assembled from a single run. Together with advances in RNA-seq and
steadily improving bioinformatics as more catabolic pathways are rigorously defined, these
developments will likely improve future large-scale screens significantly.
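The approximately 20% figure and the location of the largest step-down can be checked against the counts given earlier in this report. The stage labels below are paraphrases, and the counts mix compounds and genomes (the report gives 38 genomes and nearly 40 chemicals at the cluster stage), so the ratios should be read as approximate.

```python
"""Check the candidate funnel using the counts stated in this report."""

# (stage, count); note the units shift from compounds to genomes.
STAGES = [
    ("compounds used for enrichment", 108),
    ("compounds yielding colonies", 85),
    ("genomes with candidate transcription factors", 38),
    ("candidates expected after E. coli screens", 20),
]


def retention(stages: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Fraction of the previous stage's count retained at each step."""
    return [(stages[i][0], stages[i][1] / stages[i - 1][1])
            for i in range(1, len(stages))]


def overall_yield(stages: list[tuple[str, int]]) -> float:
    """Final count as a fraction of the starting count."""
    return stages[-1][1] / stages[0][1]


if __name__ == "__main__":
    for name, fraction in retention(STAGES):
        print(f"{name}: {fraction:.1%} retained")
    worst = min(retention(STAGES), key=lambda step: step[1])
    print(f"largest step-down: {worst[0]}")
    print(f"overall yield: {overall_yield(STAGES):.1%}")  # 18.5%
```

Running it reports retention of 78.7%, 44.7%, and 52.6% for the three steps and an overall yield of 18.5%. The smallest retention falls at the step that produces genomes with candidate clusters, consistent with the statement that catabolic-cluster identification is the largest step-down.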

Statement of Work Task List:

• Task 1 (Phase I, Year 1, Months 0-3): Completed (please refer to report HR0011-12-C-2.1).

• Task 2 (Phase I, Year 1, Months 4-9): Completed. Sixty-five isolates have been sequenced.

• Task 3 (Phase I, Year 1, Months 10-12): Completed. Selected microbes have been sequenced and
annotated.

• Task 4 (Phase I, Year 1, Month 12): Completed. Ranking of transcription factors for evaluation
was delayed because genome sequencing took longer than planned.

• Task 5 (Phase II, Year 2, Months 13-18): Initiated and optimized. Construction and testing of the
reporter system have been completed, and an automated process was developed during the sequencing
delays. We now expect to be able to process more than the 5-10 candidates originally planned.

Planned Activities for the Next Reporting Period

During the next reporting period, we will process candidate transcription factors and identify those
immediately suitable for the metabolic engineering phase of the project.
Based on the currently authorized work:

• Is current funding sufficient for the current fiscal year? Yes